Severity of infection with the SARS-CoV-2 B.1.1.7 lineage among hospitalized COVID-19 patients in Belgium

Introduction: The pathogenesis of COVID-19 depends on the interplay between host characteristics, viral characteristics and contextual factors. Here, we compare COVID-19 disease severity between hospitalized patients in Belgium infected with the SARS-CoV-2 variant B.1.1.7 and those infected with previously circulating strains.

Methods: The study was conducted within a causal framework to study the severity of SARS-CoV-2 variants by merging surveillance registries in Belgium. Infection with SARS-CoV-2 B.1.1.7 (‘exposed’) was compared to infection with previously circulating strains (‘unexposed’) in terms of the manifestation of severe COVID-19, intensive care unit (ICU) admission, or in-hospital mortality. The exposed and unexposed groups were matched based on the hospital and the mean ICU occupancy rate during the patient’s hospital stay. Other variables identified as confounders in a Directed Acyclic Graph (DAG) were adjusted for using regression analysis. Sensitivity analyses were performed to assess the influence of selection bias, vaccination rollout, and unmeasured confounding.

Results: We observed no difference between the exposed and unexposed groups in severe COVID-19 disease or in-hospital mortality (RR = 1.15, 95% CI [0.93–1.38] and RR = 0.92, 95% CI [0.62–1.23], respectively). The estimated standardized risk of ICU admission was significantly higher (RR = 1.36, 95% CI [1.03–1.68]) for patients infected with the B.1.1.7 variant. An age-stratified analysis showed that among the younger age group (≤65 years), the SARS-CoV-2 variant B.1.1.7 was significantly associated with both severe COVID-19 progression and ICU admission.

Conclusion: This matched observational cohort study did not find an overall increased risk of severe COVID-19 or death associated with B.1.1.7 infection among patients already hospitalized. There was a significantly increased risk of transfer to the ICU for patients infected with the B.1.1.7 variant, especially among the younger age group. However, potential selection biases call for more systematic sequencing of samples from hospitalized COVID-19 patients.


[...] to preserve patient privacy. The protocol of the LINK-VACC project was approved by the medical ethics committee University Hospital Brussels - Vrije Universiteit Brussel (VUB) on 03/02/2021 (reference number 2020/523) and obtained authorization from the Information Security Committee (ISC) Social Security and Health (reference number IVC/KSZG/21/034). The data of the individual data sources (Clinical Hospital Survey, Vaccinnet+, COVID-19 TestResult Database, and StatBel) within the LINK-VACC project are kept in the pseudonymized environment of healthdata.be, and the individual records in each of them are linked using a pseudonymized national reference number managed by healthdata.be under a "project mandate". A "project mandate" consists of a group of individuals, a group of variables and a time period. Access rights to the pseudonymized data in the healthdata.be data warehouse are granted ad nominatum to the scientists involved in the surveillance activities at Sciensano. External investigators with a request for selected data should send a proposal to covacsurv@sciensano.be. Depending on the type of desired data (anonymous or pseudonymized), the provision of data will have to be assessed by the Belgian Information Security Committee Social Security & Health based on legal and ethical regulations, and is outlined in a data transfer agreement with the data owner (Sciensano).

Introduction
[...] [42][43][44][45]. Indeed, the concern that B.1.1.7 also causes more severe disease compared to previously circulating strains had been raised by the UK New and Emerging Respiratory Virus Threats Advisory Group (NERVTAG) in January 2021 [46]. However, studies investigating the association between B.1.1.7 and disease severity were often inconclusive or had conflicting results.
The updated NERVTAG report [47] underlined the potential limitations of used datasets in terms of representativeness, power, potential biases in case ascertainment, selection, unmeasured confounders, and secular trends. Continuous genomic surveillance enables the detection of emerging genetic variants. Information on the estimated risk of a new variant causing more or less severe illness can assist clinicians to make prognoses. Moreover, it is important information for policy makers to issue guidelines, control transmission, and prepare the healthcare system by safeguarding healthcare capacity. Van Goethem et al [48] presented a conceptual framework to study the effect of SARS-CoV-2 variants on the severity of COVID-19 disease in hospitalized patients and described how the causal effect of variants may be estimated from data that is gathered in Belgium in the context of routine COVID-19 surveillance systems. In this study, we apply this framework to examine the effect of the B.1.1.7 lineage on disease severity among hospitalized COVID-19 patients.

Materials and methods
The study is conducted within a causal conceptual framework to assess the effect of SARS-CoV-2 variants on COVID-19 disease severity among hospitalized patients [48]. The study protocol was registered on the Open Science Framework (OSF) prior to data analysis (registration date July 16th 2021, DOI: 10.17605/OSF.IO/ZG3DJ). This manuscript is reported according to the STROBE guidelines [49].

Data sources
Sciensano, the Belgian national institute for health, has initiated the LINK-VACC project, which allows linking of selected variables from existing COVID-19 registries through the national registry number, including data on hospitalized COVID-19 patients from the Clinical Hospital Survey (CHS) [50], laboratory test results (polymerase chain reaction (PCR) tests, rapid antigen tests, and sequencing information) from the COVID-19 TestResult Database [51], administered COVID-19 vaccines from the national vaccine registry (Vaccinnet+), and socio-economic information from the Belgian Statistical Office (StatBel). Data on hospital bed occupancy were derived from the Surge Capacity Survey (SCS) [50]. Details on the different data sources and their use within the proposed conceptual framework have been described elsewhere [48].

Study population
The study population consists of hospitalized COVID-19 patients who were admitted to a Belgian hospital from August 31st 2020 onwards and for whom an admission form was reported in the CHS up to August 9th 2021. The analysis was restricted to those with a laboratory-confirmed COVID-19 infection (RT-PCR and/or rapid antigen test). Patients who were transferred, readmitted, or hospitalized in a hospital without an intensive care unit (ICU) were excluded. Patients admitted during the first wave (i.e., admitted before August 31st 2020) were excluded, so the study period corresponds to the second and third waves of the COVID-19 epidemic in Belgium. Indeed, protocols, treatment regimens and the professional experience of healthcare personnel changed substantially between the first and second waves, and are considered to be more comparable between the second and subsequent waves.

Exposure
Infection with the SARS-CoV-2 variant of concern (VOC) B.1.1.7 ("alpha-variant"; exposed group) was compared to infection with previously circulating SARS-CoV-2 strains (unexposed group). Exposure to B.1.1.7 was identified through whole-genome sequencing (WGS; i.e., confirmed B.1.1.7 samples) obtained from both baseline and active genomic surveillance, and the subsequent registration of the Pangolin lineage [52] B.1.1.7 in the COVID-19 TestResult Database. As such, the exposed group consisted of hospitalized COVID-19 patients with an admission form registered in the CHS and identified as being infected with the B.1.1.7 variant through linkage with the COVID-19 TestResult Database based on the national registry number. Patients whose sample was compatible with a known VOC, as obtained through presumptive genotyping without WGS confirmation, were not considered for the current analysis. To ensure that the hospital admission was related to the detected infection with the B.1.1.7 variant, patients with a sample collected more than 14 days before hospital admission or collected after hospital discharge were excluded. The unexposed group consists of COVID-19 patients with an admission form registered in the CHS who were diagnosed and admitted to the hospital before December 1st 2020, and were therefore considered to be infected with previously circulating SARS-CoV-2 strains. According to GISAID's EpiCoV database, the first identified B.1.1.7 variant in Belgium dates back to November 30th 2020 (sample date). Therefore, it is highly unlikely that patients hospitalized before December 1st 2020 were infected with the B.1.1.7 variant.
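The exposure rules above can be summarized in a short sketch. This is an illustration only, not the code used in the study: the function name `classify_exposure`, its arguments, and the simplification of using the admission date alone (rather than both diagnosis and admission dates) for the unexposed group are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import Optional

# Dates and windows taken from the text; the names are hypothetical.
UNEXPOSED_CUTOFF = date(2020, 12, 1)   # admitted before this date -> unexposed
MAX_DAYS_BEFORE_ADMISSION = 14         # sample may precede admission by at most 14 days


def classify_exposure(admission: date,
                      discharge: date,
                      lineage: Optional[str] = None,
                      sample_date: Optional[date] = None) -> Optional[str]:
    """Return 'exposed', 'unexposed', or None (not included in the comparison)."""
    if lineage == "B.1.1.7" and sample_date is not None:
        # WGS-confirmed B.1.1.7: keep only samples plausibly linked to this stay.
        too_early = sample_date < admission - timedelta(days=MAX_DAYS_BEFORE_ADMISSION)
        too_late = sample_date > discharge
        return None if (too_early or too_late) else "exposed"
    if admission < UNEXPOSED_CUTOFF:
        # No sequencing result needed: admitted before B.1.1.7 circulated in Belgium.
        return "unexposed"
    return None  # later admission without a confirmed B.1.1.7 lineage
```

A patient admitted in November 2020 is classified as unexposed without any sequencing result, whereas a patient admitted in 2021 enters the comparison only with a WGS-confirmed B.1.1.7 sample taken within the allowed window.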

Study design
The study is an observational multi-center matched cohort study in which hospitalized COVID-19 patients are followed up from hospital admission until death or hospital discharge and for whom information was obtained by merging different national surveillance systems based on the national registry number. The unexposed group was matched to the exposed group based on the hospital and the mean ICU occupancy rate during the hospital stay of the patient to ensure similar levels of care between both exposure groups, as an oversaturated ICU was previously shown to impact in-hospital mortality [22].

Outcome
The primary outcome among the hospitalized study population is the development of severe COVID-19, defined as the presence of either ICU admission, acute respiratory distress syndrome (ARDS), or in-hospital death. ICU admission and in-hospital mortality have also been analyzed as two secondary outcomes.

Confounding
The conceptual framework described by Van Goethem et al [48] used Directed Acyclic Graphs (DAGs) to represent the assumptions and limitations for estimating the causal effect of SARS-CoV-2 variants on disease severity by means of observational data gathered from routine COVID-19 surveillance systems in Belgium. Several potential confounders of the variant-severity relationship have been identified within the conceptual framework [48] and should be adjusted for to estimate a causal effect. The variables identified as direct confounders in the DAG were adjusted for using regression analysis, whereas the indirect confounders (hospital and ICU occupancy rate) were adjusted for using matching.

Exposed and unexposed patients were matched on the hospital and on the average ICU bed occupancy rate (defined as the number of COVID-19 ICU patients in the hospital divided by the hospital's number of recognized ICU beds) during their hospital stay. Exact matching with a rounding to 5% of the ICU occupancy rate was used, as this resulted in the least loss of subjects while maintaining comparable levels of care between exposed and unexposed matches. Demographic and clinical information of the matched study population was presented per exposure status.
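As an illustration of the matching step, the following minimal sketch computes a mean ICU occupancy rate and groups patients into exact-match strata defined by hospital and occupancy rounded to the nearest 5%. It is not the study code: the function names, the dictionary keys, and the representation of a stay as a list of daily ICU counts are assumptions made for this example, and the matching weights used in the later regression are not derived here.

```python
from collections import defaultdict


def mean_icu_occupancy(daily_covid_icu_patients, recognized_icu_beds):
    """Average over the stay of (COVID-19 ICU patients / recognized ICU beds)."""
    mean_count = sum(daily_covid_icu_patients) / len(daily_covid_icu_patients)
    return mean_count / recognized_icu_beds


def occupancy_bin(rate, width=0.05):
    """Index of the nearest multiple of `width` (e.g. 0.23 falls in bin 5, i.e. 25%)."""
    return round(rate / width)


def match_exact(patients):
    """Group patients by (hospital, occupancy bin).

    `patients` is an iterable of dicts with the hypothetical keys
    'id', 'exposed', 'hospital' and 'mean_icu_occupancy'.
    Only strata containing both exposed and unexposed patients are kept.
    """
    strata = defaultdict(lambda: {"exposed": [], "unexposed": []})
    for p in patients:
        key = (p["hospital"], occupancy_bin(p["mean_icu_occupancy"]))
        group = "exposed" if p["exposed"] else "unexposed"
        strata[key][group].append(p["id"])
    return {k: v for k, v in strata.items() if v["exposed"] and v["unexposed"]}
```

Patients in strata that lack either an exposed or an unexposed member are dropped, which is how exact matching loses subjects; a wider rounding interval loses fewer patients at the cost of less comparable levels of care.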
Twenty-fold multiple imputation of missing values was performed using the mice package [54] for all covariates (see Table 1) used in the multivariable model (see further) and for all outcomes. Binary, categorical and numerical variables were imputed with logistic regression, multinomial regression and predictive mean matching, respectively. The primary outcome, disease severity, is an indicator based on three original variables and was passively imputed and not used as predictor for missing values on its components. The imputation was performed using thirty iterations to achieve good convergence of the MCMC and the visit sequence was set from low to high proportion of missing data.
Regression standardization [55] was done using a weighted logistic model (using matching weights) with the following covariates: SARS-CoV-2 variant, age, gender, ethnicity, comorbidities (cardiovascular disease, hypertension, solid cancer, hematological cancer, chronic lung disease, chronic kidney disease, chronic liver disease, chronic neurological disease, cognitive disorder, diabetes, obesity, immunocompromised), place of infection (community, hospital, nursing home), socio-economic variables (education level at the individual level, and population density and median taxable income in the postcode of residence), vaccination status at diagnosis (no vaccination, partially vaccinated, fully vaccinated), and two-way interactions of these variables with the SARS-CoV-2 variant. Numeric variables were entered in the model with linear and quadratic terms. The causal effect was estimated with a relative risk (RR) and a risk difference (RD). Block bootstrapping [56] of matched pairs (B = 1000 replications) was done on each imputed dataset [57] to estimate the variance on each parameter of interest. Pooled point estimates and confidence intervals were then obtained using Rubin's rules for multiple imputation [58].
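The last two steps, computing the risk contrasts and pooling across imputed datasets, can be sketched as follows. This is a simplified illustration rather than the analysis code: the function names and the example numbers are hypothetical, and the confidence interval uses a normal approximation instead of the t-distribution reference normally used with Rubin's rules.

```python
from math import sqrt
from statistics import mean, variance


def risk_contrasts(risk_exposed, risk_unexposed):
    """Relative risk (RR) and risk difference (RD) from two standardized risks."""
    return risk_exposed / risk_unexposed, risk_exposed - risk_unexposed


def pool_rubin(estimates, variances):
    """Pool one estimate per imputed dataset with Rubin's rules.

    Returns the pooled point estimate and its total variance:
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def normal_ci(estimate, total_variance, z=1.96):
    """Approximate 95% confidence interval (normal approximation, an assumption)."""
    half_width = z * sqrt(total_variance)
    return estimate - half_width, estimate + half_width


# Hypothetical per-imputation RR estimates and bootstrap variances (not study data).
rr_hat = [1.30, 1.38, 1.41]
rr_var = [0.020, 0.022, 0.021]
pooled_rr, total_var = pool_rubin(rr_hat, rr_var)
low, high = normal_ci(pooled_rr, total_var)
```

In the workflow described above, the per-imputation variances would come from the block bootstrap of matched pairs, and twenty imputed datasets would be pooled rather than the three shown here.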
A stratified analysis according to age group (≤65 and >65 years old) was performed and considered as an exploratory analysis as it was not pre-specified in the protocol.
All analyses were conducted in R 4.0.1 [59].

Sensitivity analyses
A first sensitivity analysis was performed including only WGS results obtained from baseline unbiased surveillance (i.e., without active selection of specific patient groups as explained in detail in the causal framework of Van Goethem et al [48]). A second sensitivity analysis was performed including only patients that had not received a first vaccination dose before their COVID-19 diagnosis. The same modeling procedure as above was conducted on these two populations. Thirdly, robustness of the results to potential unmeasured or uncontrolled confounding and selection bias was assessed using the EValue package and summarized using the multi-bias E-value [60,61]. The E-value is defined as the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away a specific treatment-outcome association, conditional on the measured covariates [60].

Assessment of selection bias
Potential selection bias was assessed by comparing baseline characteristics and outcomes between patients with and without available SARS-CoV-2 variant information (confirmed, i.e. [...]). [...] [38], this comparison helps to assess whether the profiles of patients whose samples were sequenced differ from those of patients whose samples were not (e.g., due to a higher viral load or preferential sequencing).

Ethics and data protection authorization
Ethical approval was granted for the gathering of data from hospitalized patients by the Committee for Medical Ethics from the Ghent University Hospital (reference number BC-07507) and authorization for possible individual data linkage using the national register number was obtained from the Information Security Committee (ISC) Social Security and Health (reference number IVC/KSZG/20/384). Linkage of hospitalized patient data to vaccination, testing, sequencing and socioeconomic data within the LINK-VACC project was approved by the medical ethics committee UZ Brussels-VUB on 03/02/2021 (reference number 2020/523) and obtained authorization from the ISC Social Security and Health (reference number IVC/KSZG/21/034). Informed consent was waived based on articles 6 and 9 of the GDPR. The collection is allowed based on general interest (art. 6 GDPR) and on article 9 § 2 of the GDPR: processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards of quality and safety of health care and of medicinal products or medical devices, on the basis of Union or Member State law which provides for suitable and specific measures to safeguard the rights and freedoms of the data subject, in particular professional secrecy.

Basic descriptive characteristics of the matched study population
As recorded on August 9th 2021, the CHS database contained a total of 73,370 case records of COVID-19 patients, of which admission forms were received for 67,948 patients (Fig 1). After exclusion of patients not meeting inclusion criteria, a total of 35,558 hospitalized COVID-19 patients were recorded as admitted after August 31st 2020 and 17,642 (49.6%) of them had available exposure information. These were either identified as having a confirmed B.1.1.7 infection (n = 523; exposed) upon linkage with the COVID-19 TestResult Database, or classified as unexposed (n = 17,119), meaning that they were diagnosed and admitted before December 1st 2020 and consequently considered as being infected with previously circulating SARS-CoV-2 strains. Of the 523 patients with a confirmed B.1.1.7 infection (exposed), 500 could be matched to 3,419 patients infected with previously circulating strains (unexposed) based on the hospital and the mean ICU occupancy rate rounded to 5%, and a total of 3,919 cases were thus included in the descriptive analysis.

PLOS ONE

Sensitivity analyses
A first sensitivity analysis assessed whether including only samples sequenced within the context of baseline (i.e., not active) surveillance would influence the results. The causal effect estimates within this subgroup, as presented in S1 Table, are similar to the main analysis results. The second sensitivity analysis excluded patients who had received at least one vaccination dose before their COVID-19 diagnosis, in order to account for the impact of the vaccination rollout between the exposed and unexposed group. S2 Fig shows a flow chart for selection of patients who did not receive a vaccination dose before their COVID-19 diagnosis. Of the 419 patients with a confirmed B.1.1.7 infection and no vaccination dose received before diagnosis, 405 could be matched to 2,881 patients infected with previously circulating variants. The causal effect estimates within this subgroup, as presented in S2 Table, are similar to the main analysis results.

Table columns: Outcome; Risk (in %) [95% CI]; RR [95% CI]; RD (in %) [95% CI].
The E-value and multi-bias E-value were calculated to assess the influence of selection bias (e.g., based on the viral load) and/or unmeasured confounding (e.g., genetic profile of the patient) on the observed RR for each of the outcomes (S3 Table). The observed significant RR of 1.36 for ICU admission could be explained by an unmeasured confounder that was associated with both the exposure (SARS-CoV-2 variant) and ICU admission by a RR of 2.06-fold each, above and beyond the measured confounders, but weaker confounding could not do so; the confidence interval could be moved to include the null by an unmeasured confounder that was associated with both the exposure and ICU admission by a RR of 1.21-fold each, above and beyond the measured confounders, but weaker confounding could not do so. The same applies to selection on a variable with associations to both exposure and ICU transfer of at least 2.06 (1.21 for the 95% CI). A multi-bias E-value of 1.60 was obtained when considering both unmeasured confounding and selection bias simultaneously. This means that an unmeasured confounder with an association on the RR-scale of at least 1.60 to both exposure and outcome and selection on a variable with an association on the RR-scale of at least 1.60 to both exposure and outcome could explain the observed effect (above and beyond the variables that were controlled for in the model).
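The single-bias E-values quoted above can be reproduced with the closed-form expression from the E-value literature [60], E = RR + sqrt(RR × (RR − 1)) for RR > 1. The sketch below is a stand-alone illustration with hypothetical function names, not a call into the EValue package, and it does not cover the multi-bias E-value.

```python
from math import sqrt


def e_value(rr):
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)), after inverting RR < 1."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted before applying the formula
    return rr + sqrt(rr * (rr - 1))


def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null (1.0 if the CI includes 1)."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(lower if lower > 1 else upper)


# ICU admission estimate reported in the text: RR = 1.36, 95% CI [1.03-1.68].
print(round(e_value(1.36), 2))           # 2.06
print(round(e_value_ci(1.03, 1.68), 2))  # 1.21
```

For the outcomes whose confidence interval includes 1, such as severe COVID-19 (RR = 1.15, 95% CI [0.93-1.38]), the E-value for the confidence limit is 1 by definition, because no unmeasured confounding is needed to move the interval to the null.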

Selection bias
Selection bias was assessed by comparing patients whose SARS-CoV-2 positive sample was selected for WGS analysis with patients whose sample was not. Patients whose sample was compatible with a known VOC, as obtained through presumptive genotyping without WGS confirmation, were excluded. Of the 9,599 patients with an available admission form registered in the CHS, meeting the inclusion criteria, and admitted to the hospital after March 1st 2021, 672 (7%) had a sample with a confirmed Pangolin lineage. About half of those sequencing results (53%; 357/672) were obtained through baseline surveillance. S4 Table compares patients with variant information (obtained through baseline WGS surveillance) to patients without SARS-CoV-2 variant information. Patients for whom baseline WGS surveillance was performed were more frequently male, nursing home residents, immunocompromised, fully vaccinated, and admitted to a university hospital, and had more frequently contracted their infection within the hospital. Moreover, these patients were more frequently transferred to ICU as compared to patients without available sequence information. When stratifying per hospital type, patients in general hospitals with viral sequence data were more frequently admitted to ICU as compared to patients without viral sequence data (20.9%, 95% CI [15.9%-26.8%] and 13.7%, 95% CI [12.8%-14.5%], respectively), whereas this difference was not observed among patients admitted to general hospitals with university characteristics or university hospitals.

Discussion
This study aimed to assess the effect of the SARS-CoV-2 VOC B.1.1.7 (also labeled as alpha-variant) on disease severity among hospitalized COVID-19 patients within an existing causal framework [48] using linked data from routine COVID-19 surveillance systems in Belgium. We observed no significant difference in severe COVID-19 disease or in-hospital mortality by SARS-CoV-2 lineage (B.1.1.7 versus non-sequenced previously circulating variants) in an adjusted analysis (RR = 1.15, 95% CI [0.93-1.38] and RR = 0.92, 95% CI [0.62-1.23], respectively). This is in line with the findings from Frampton et al [40], where no association was found between B.1.1.7 infection and severe disease or death within a hospitalized cohort. On the other hand, community-based studies revealed an increased risk of overall mortality associated with B.1.1.7 in people testing positive for COVID-19 [62][63][64][65]. These findings may suggest that the effect of B.1.1.7 is different in a hospitalized cohort than in the general population, and they do not exclude an increased risk of hospital admission with the B.1.1.7 lineage [64]. Indeed, one Danish [66] and two UK [67,68] studies suggested that infection with lineage B.1.1.7 was associated with an increased risk of hospitalization compared with that of other circulating strains or the wild-type variant. As such, it is possible that the B.1.1.7 variant carries an increased risk of hospitalization, but that there is no additive risk of mortality once hospitalized [40,64,69]. However, restricting the analysis to hospitalized patients may induce collider bias [70,71]. Among hospitalized patients, the relationships between any variables that relate to hospitalization will be distorted compared to the relationships that exist among the general population [70]. As such, the identified associations within the hospitalized population may not reflect the patterns in the general population (i.e., lack of external validity) [71].
The estimated standardized risk of ICU admission was significantly higher (RR = 1.36, 95% CI [1.03-1.68]) in patients infected with the B.1.1.7 variant. This is in line with the findings from a community-based study by Patone et al, who reported that people infected with lineage B.1.1.7 had double the risk of admission to ICU compared to those infected with non-B.1.1.7 SARS-CoV-2 [64]. However, we should be aware that selection bias could potentially invalidate our causal inference estimates [61]. Here, we observed that patients with variant information available differ from patients whose samples were not selected for WGS analysis. Specifically, 22% of hospitalized patients with available sequencing results were transferred to ICU, whereas this was only the case for 16% of hospitalized patients without information on the SARS-CoV-2 lineage of their infection. This could in part be explained by the fact that patients with available sequence information were more often admitted to a university hospital, where the proportion of ICU transfers is higher. However, given our matched cohort design, the type of hospital is perfectly balanced between the exposed and unexposed group and should not result in confounding. Furthermore, the model also matches patients based on levels of ICU occupancy, as patients may be less likely to be admitted when ICU capacity is oversaturated. Still, selection bias may arise when the samples from ICU patients are preferentially selected for WGS. Indeed, if a nonrandom selection of samples for WGS based on the severity of disease or ICU admission occurs, this may partially explain why we observed a higher standardized risk for ICU admission for patients with a confirmed B.1.1.7 infection compared to patients without available sequencing results that were considered to be infected with previously circulating strains.
However, a sensitivity analysis considering only sequencing results obtained through baseline (unbiased) surveillance provided similar results. Another potential source of bias is the fact that only samples with a sufficiently high viral load (≥10^3–10^4 RNA copies/mL) can be sequenced due to technical limitations. This could bias our conclusions, as a higher viral load can be associated with severe disease [42]. However, the viral load also depends on the stage in which the patient is sampled (time of sampling) and the underlying conditions of the patients. Here, the robustness of our obtained causal inference estimates to potential uncontrolled confounding, such as the viral load, was assessed using the E-value [60]. If both the association between viral load and exposure (i.e., SARS-CoV-2 variant) and the association between viral load and ICU transfer are at least 2.06 on the risk ratio scale (conditional on the other included covariates), this could completely nullify the observed causal estimate (RR = 1.36, 95% CI [1.03-1.68]) for ICU admission. This relatively large E-value implies that considerable unmeasured or uncontrolled confounding would be needed to explain away our obtained effect estimate.
Our exploratory analyses revealed important differences in the risk for severe COVID-19 and ICU admission associated with the B.1.1.7 variant according to age. We did observe an increased risk of severe COVID-19 related to the B.1.1.7 variant among the younger age group (≤65 years), whereas severity seemed to be independent of the SARS-CoV-2 variant among the older age group (>65 years). This is in line with an analysis based on data from seven EU countries that also suggests a higher risk for hospitalization and ICU admission in age groups <60 years for B.1.1.7, whereas this was not the case for the older age groups [72]. One hypothesis to explain these observations is that the B.1.1.7 variant causes a higher viral load [40] as compared to previously circulating variants, but that the positive correlation between viral load and disease severity is only observed in younger patients. Indeed, it has been shown that respiratory viral loads were generally correlated with inflammatory responses in younger patients, but less correlated with those in older patients [73].
Within the current study, the exposed and unexposed group are completely separated in time. As a limitation, the unexposed patients in the analysis did not have information (obtained through WGS) on the SARS-CoV-2 variant of their infection. They were defined as being infected with 'previously circulating strains' as they were diagnosed and admitted to the hospital before December 1st 2020, i.e., before the circulation of any VOC in Belgium. However, we cannot rule out the possibility that a patient was hospitalized in Belgium after being infected with a VOC abroad. As large-scale COVID-19 genomic surveillance was initiated when B.1.1.7 became dominant in Belgium, there were insufficient sequenced non-VOC samples from patients hospitalized after December 1st 2020 to facilitate comparisons. Given the different time periods and the non-randomized observational study design, the exposed and unexposed groups differ considerably in terms of patient characteristics and contextual factors. The profile of hospitalized patients may change over time according to the demography of the viral circulation. Indeed, the patients in the exposed group were younger (in line with Frampton et al [40]), which may also explain the differences in distributions of comorbidities, illness severity, and presenting symptoms at admission between both groups. Given the different time periods, there may be an impact of the vaccination rollout in Belgium, which started in early 2021 and prioritized nursing home residents, healthcare workers, and people with comorbidities. However, a sensitivity analysis excluding the vaccinated patients provided similar results. Further, although there were no apparent changes in national or regional policies, there may exist differences in indications for hospitalization of COVID-19 patients between the two time periods related to the number of available beds and medical personnel.
However, we believe that matching the exposure groups based on the mean ICU occupancy rate (calculated as the number of COVID-19 patients occupying the recognized ICU beds within the hospital in which the patient was admitted and averaged over the patient's hospital stay) controlled well for the risk of hospital or ICU admission related to organizational characteristics. In addition, matching on the hospital makes it possible to account for between-hospital differences in admission criteria and levels of care. Moreover, the decision-making process to admit COVID-19 patients may also be influenced by individual patient characteristics such as age. Therefore, a major strength of the current study in general is the ability to control for an extensive list of potential confounders (i.e., patient characteristics and contextual factors that differ between the two time periods) given the detailed patient information that is collected within the CHS and the linkage to other data sources. For instance, our ability to control for the mean ICU occupancy rate is an important strength given previous observations that mortality is affected by how many patients require intensive care in a hospital setting [22,74]. As a limitation, we lacked information on the staff-to-patient ratio and could not take into account the number of newly created ICU beds per hospital. Also, there may exist other time-dependent factors for which we are unable to adjust. This will in general always be an issue, as different emerging variants will become dominant consecutively over time and as there is often only a short period in which two variants co-circulate and can be directly compared. Also, in order to study the clinical impact of variants within the current framework based on linking routine COVID-19 registries, one variant may need to dominate a previous one before a sufficiently large sample size is reached. This has implications for the timeliness of the results for guiding policy making.
The limitations that we encountered with regard to potential selection biases and a sample size that depends on the linkage of existing COVID-19 registries emphasize the need for more systematic sequencing of samples from hospitalized COVID-19 patients. A major focus of the current genomic surveillance program is on detecting newly emerging variants and flagging specific events, such as breakthrough cases, re-infections, and geographic dynamics by monitoring returning travelers [75]. However, a detailed analysis of the association between SARS-CoV-2 variants and disease severity requires a sufficiently large and representative sample. This could be achieved by a better alignment of the different stakeholders. For example, sequencing capacity could be efficiently redistributed by performing random or exhaustive sequencing of COVID-19 samples from hospitalized patients. This would optimize linking of multiple independent data sources in settings where this is required. Further, the indication for sequencing (i.e., baseline versus active surveillance of severe patients) should be well documented by the laboratories when reporting data in order to avoid selection biases.

Conclusions
In this observational multi-center matched cohort study, we found no increased risk of severe COVID-19 or death associated with B.1.1.7 infection, compared to previously circulating SARS-CoV-2 strains, among patients already hospitalized. Within an age-stratified analysis, we did observe that among the ≤65 age group the risk of severe COVID-19 was higher among patients infected with the B.1.1.7 variant, whereas severity was independent of the SARS-CoV-2 variant among the older age group (>65 years). Although we should take into account the risk of non-random selection of samples for WGS, we did observe an overall association between B.1.1.7 infection and ICU admission. While at the moment of writing the Delta variant has completely overtaken the B.1.1.7 variant [76], this analysis may still provide useful scientific information for future comparisons with newly emerging variants. Real-time and unbiased assessments of the severity related to emerging SARS-CoV-2 variants should be foreseen in the future. Systematic screening of samples from hospitalized COVID-19 patients is needed to avoid potential biases.

Supporting information
analysis within a multi-center matched cohort study to assess the impact of SARS-CoV-2 variants on COVID-19 disease severity among hospitalized patients in Belgium. (TIF)
S1 Table. Risk per exposure group (in %), Relative Risk (RR) and Risk Difference (RD, in %) estimates and 95% Confidence Interval (CI) for main and secondary outcomes when only considering Whole-Genome Sequencing (WGS) results obtained through baseline surveillance. Results (overall and stratified per age group) for a sensitivity analysis within a multi-center matched cohort study to assess the impact of SARS-CoV-2 variants on COVID-19 disease severity among hospitalized patients in Belgium. (DOCX)
S2 Table. Risk per exposure group (in %), Relative Risk (RR) and Risk Difference (RD, in %) estimates and 95% Confidence Interval (CI) for main and secondary outcomes when excluding patients that had received a first vaccination dose before their COVID-19 diagnosis. Results (overall and stratified per age group) for a sensitivity analysis within a multi-center matched cohort study to assess the impact of SARS-CoV-2 variants on COVID-19 disease severity among hospitalized patients in Belgium. (DOCX)
S3 Table. Sensitivity analysis using the E-value. Sensitivity analysis within a multi-center matched cohort study to assess the impact of SARS-CoV-2 variants on COVID-19 disease severity among hospitalized patients in Belgium. (DOCX)
S4